Abstract



The study of balls-into-bins processes or occupancy problems has a long history. These processes
can be used to translate realistic problems into mathematical ones in a natural way. In
general, the goal of a balls-into-bins process is to allocate a set of independent objects (tasks,
jobs, balls) to a set of resources (servers, bins, urns) and, thereby, to minimize the maximum
load. In this paper, we analyze the maximum load for the chains-into-bins problem, which is
defined as follows. There are $n$ bins, and $m$ objects to be allocated. Each object consists of
balls connected into a chain of length $\ell$, so that there are $m\ell$ balls in total. We assume the
chains cannot be broken, and that the balls in one chain have to be allocated to $\ell$ consecutive
bins. We allow each chain $d$ independent and uniformly random bin choices for its starting
position. The chain is allocated using the rule that the maximum load of any bin receiving a
ball of that chain is minimized. We show that, for $d \ge 2$ and $m \cdot \ell = O(n)$, the maximum load
is $((\ln\ln m)/\ln d) + O(1)$ with probability $1 - \tilde{O}(1/m^{d-1})$.
1 Introduction

The study of balls-into-bins processes or occupancy problems has a long history. These models are 
commonly used to derive results in probability theory. Furthermore, balls-into-bins processes can 
be used as a means of translating realistic load-balancing problems into mathematical ones in a 
natural way. In general, the goal of a balls-into-bins process is to allocate a set of independent 
objects (tasks, jobs, balls) to a set of resources (servers, bins, urns). It is assumed that the balls 
are independent and do not know anything about the other balls. Each ball is allowed to choose 
a subset of the bins independently and uniformly at random (i.u.r.) in order to be allocated into 
one of these bins. The performance of these processes is usually analyzed in terms of the maximum 
load of any bin. 

One extreme solution is to allow each ball to communicate with every bin. Thus, it is possible to 
query the load of every bin and to place the ball into the bin that is least loaded. This allocation 
process always yields an optimum allocation of the balls. However, the time and the number of 
communications needed to allocate the balls are extremely large. The opposite approach is to allow 
every ball to communicate with only one bin. The usual model is for every ball to be thrown into 
one bin chosen independently and uniformly at random. For the case of $m$ balls allocated to $n$
bins i.u.r., it is well known that a bin that receives $m/n + \Theta\left(\sqrt{(m \log n)/n}\right)$ balls exists with high
probability (w.h.p.).^1 An alternative approach, which lies between these two extremes, is to allow
every ball to select one of $d \ge 2$ i.u.r. chosen bins. The GREEDY[d] process, studied by Azar et
al. [1], chooses $d$ i.u.r. bins per ball, and the ball is allocated into the least loaded among these bins.
For this process, and $m \ge n$, the maximum number of balls found in any bin, i.e., the maximum
load, is $m/n + \ln\ln n/\ln d + O(1)$ w.h.p. (see, for example, [1], [2]). Thus, even a small amount of
additional random choice can decrease the maximum load drastically compared to a single choice.
This phenomenon is often referred to as the "power of two random choices" (see [7]).

In this paper we consider the chains-into-bins problem, which can be regarded as a generalization of
the balls-into-bins problem. We are given $m$ chains consisting of $\ell$ balls each. The balls of any chain
have to be allocated to $\ell$ consecutive bins. We allow each chain $d$ i.u.r. bin choices, and allocate the
chain using the rule that the maximum load of any bin receiving a ball of that chain is minimized.
We show that, for $d \ge 2$ and $m \cdot \ell = o(n \cdot (\ln\ln m)^{1/2})$, the maximum load achieved by
this algorithm is at most $(\ln\ln m/\ln d) + O((m\ell/n)^2) + O(1)$, with probability $1 - O((\ln m)^d/m^{d-1})$.
This result shows that for a fixed number of balls, the maximum load decreases with increasing
chain length. The maximum load depends on the number of chains only in the following sense.
Allocating $m = n/\ell$ chains of length $\ell$ (with a total number of $n$ balls) into $n$ bins will, w.h.p.,
result in a maximum load of at most $\ln\ln(n/\ell)/\ln d + O(1)$. It follows that if $\ell = O((\ln n)^a)$ for
any $a > 0$, our result is asymptotically the same as that for allocating $n/\ell$ balls into $n/\ell$ bins using
the GREEDY[d] protocol of [1].

We also prove that the naive heuristic that 'allocates the chain headers using GREEDY[d] and hopes
for the best' performs badly for some values of $m, \ell$, as one might expect. Indeed, if $m \ge \ln n$ and
$\ell \ge (\ln m)/\ln\ln m$, then the maximum occupancy of this heuristic is at least $(\ln m)/(2\ln\ln m)$,
w.h.p.



^1 A sequence of events $A_n$ occurs with high probability if $\lim_{n \to \infty} P(A_n) = 1$.



Clustering. It can be seen that, provided we can make some extra assumptions, the results of [1]
can be applied to the chains-into-bins problem for $m$ chains of length $\ell$. Suppose we are allowed
to cluster the bins into $N = n/\ell$ clusters of $\ell$ successive bins, and each chain can be allocated
directly to one cluster. This is now equivalent to allocating $m$ balls into $N$ bins. Thus, we would
get a maximum load of $\Theta((\ln\ln N)/\ln(d) + 1)$ with GREEDY[d] and $(\ln\ln N)/d + O(1)$ using
the ALWAYS-GO-LEFT[d] protocol. However, this solution essentially ignores the model under
consideration, and is equivalent in a hashing context to saying that we do not need to hash the
data item at the given location, but rather somewhere in the next $\ell$ cells at our convenience. If we
have this freedom to ignore the locations we are given and pack the balls into $N = n/\ell$ clusters of
length $\ell$, then why not $N = n/(2\ell)$ clusters of length $2\ell$, placing the chains one after the other?
Indeed, why not arbitrary $N$, or even $N = 1$, packing the chains cyclically in a round-robin fashion?
That would be even more efficient. Obviously, if such clustering were available it would be easier to
organize the behavior of the balls. We assume henceforth that we have to put the chains where we
are instructed, rather than where we would like to. Finally, we remark that, provided $m = N = n/\ell$,
we can get the same results without restructuring the problem, and thus the extra provisions are
unnecessary.



Applications. Our model can be viewed as a form of hashing in which the first data item of the 
chain is placed in the selected position of the hash table, and the remaining items overflow into the 
neighbouring positions of the table. 

The chains-into-bins problem has several important applications. One example is data storage on 
disk arrays, such as RAID systems (see [9]). Here, each data item is stored on several neighbouring 
disks in order to increase the data transfer rate. In this case, the bins model the disks from the 
storage array, and the chains model data requests which are directed to several neighbouring "bins." 
A second application is the scheduling of reconfigurable embedded platforms (see [4, 11]). Here,
the tasks and the reconfigurable chip are modelled as rectangles with integral dimensions. All tasks
have the same height but different lengths. The chip is modelled by a much larger rectangle that
can hold several tasks in both dimensions. The goal is to allocate the tasks to a chip with a fixed
length such that the required height is minimized. In this case, the tasks are modelled by the
chains and the chip is modelled by the bins. The problem also models a scheduling problem where
$m$ allocated items persist in the system for $\ell$ time steps. For example, imagine a train travelling in
a circle with $n$ station stops. The bins represent stations and the length of a chain represents the
number of stops travelled by a passenger.



1.1 Related Work 

Azar et al. [1] introduced GREEDY[d] to allocate $n$ balls into $n$ bins. GREEDY[d] chooses $d$
bins i.u.r. for each ball and allocates the ball into a bin with the minimum load. They show that
after placing $n$ balls, the maximum load is $O((\ln\ln n)/\ln d)$, w.h.p. Compared to single-choice
processes, this is an exponential decrease in the maximum load. For the case where $m < n$, their
results can be extended to show a maximum load of at most $(\ln\ln n - \ln\ln(n/m))/\ln d + O(1)$,
w.h.p. Vöcking [13] introduced the ALWAYS-GO-LEFT[d] protocol, which clusters the bins into
$d$ clusters of $n/d$ consecutive bins each. Every ball now chooses i.u.r. one bin from every cluster
and is allocated into a bin with the minimum load. If several of the chosen bins have the same
minimum load, the ball is allocated into the "leftmost" bin. The protocol yields a maximum load of
$(\ln\ln n)/d + O(1)$, w.h.p. In [5], Kenthapadi and Panigrahy suggest an alternative protocol yielding
the same maximum load. They cluster the bins into $2n/d$ clusters of $d/2$ consecutive bins each.
Every ball now randomly chooses 2 of these clusters and is allocated into the cluster with the
smallest total load. In the chosen cluster, the ball is then allocated into the bin with minimum
load. The authors also argue in that paper that clustering is essential to reduce the load
to $(\ln\ln n)/d + O(1)$. In [2], the authors analyse GREEDY[d] for $m \gg n$. It is shown that the
maximum load is $m/n + \ln\ln(n)$, w.h.p. Mitzenmacher et al. [8] show that a similar performance
gain occurs if the process is allowed to memorize a constant number of bins with small load.

In [10], Sanders and Vöcking consider the random arc allocation problem, which is closely related
to the chains-into-bins problem. In their model, they allocate arcs of an arbitrary length to a cycle.
Every arc is assigned a position i.u.r. on the cycle. The chains-into-bins problem with $d = 1$ can
be regarded as a special discrete case of their problem, where the cycle represents the $n$ bins and
the arcs represent the chains (in [10], different arc lengths are allowed). Translated into the
chains-into-bins setting, the authors show the following result. If $m = n/\ell$ chains of length $\ell$ are allocated
to $n$ bins ($m \to \infty$), then the maximum load is at most $(\ln(n/\ell))/(\ln\ln(n/\ell))$, w.h.p. Note that
their result is asymptotically the same as that for allocating $n/\ell$ balls into $n/\ell$ bins, provided that
$n/\ell \to \infty$. In [3], the author shows that the expected maximum load is smaller if we allocate $n/2$
chains of length 2 with one random choice per chain, compared to $n$ balls into $n$ bins with $d = 2$.



2 Model and Results 

Assume $m$ chains of length $\ell$ are allocated i.u.r. to bins wrapped cyclically round $1, \ldots, n$. A chain
contains $\ell$ balls linked together sequentially. The first ball of a chain is called the header; the remaining
balls comprise the tail of the chain. If chain $i$ (meaning the header of chain $i$) is allocated to bin
$j$, then the balls of the chain occupy bins $j, j+1, \ldots, j+\ell-1$, where counting is modulo $n$. We
define the h-load of a bin as the number of headers allocated to the bin. This is to be distinguished
from the load of bin $j$. The load is the total number of balls allocated to bin $j$; that is, the number
of chain headers allocated to bins $j-\ell+1, \ldots, j-1, j$.

We consider the case where each chain header randomly chooses $d$ bins $j_1, \ldots, j_d$. For random
choice $j_k$ it computes the maximum load of bins $j_k, j_k+1, \ldots, j_k+\ell-1$. The chain header is
allocated to the bin $j_k \in \{j_1, \ldots, j_d\}$ such that the maximum load is minimized. This allocation
process is called GREEDY_CHAINS[$d,\ell$].
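
The allocation rule just described can be sketched as a short simulation. This is only an illustrative sketch, not the authors' code: the function name `greedy_chains`, the 0-based bin labels, the seeded generator, and the tie-breaking rule (the first of the $d$ choices wins on ties) are assumptions made here.

```python
import random


def greedy_chains(n, m, d, ell, seed=0):
    """Simulate the chain allocation rule; return the list of bin loads.

    Bins are labelled 0..n-1 (an assumption; the text uses 1..n) and
    ties between candidate positions go to the earliest choice.
    """
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):
        best_start, best_max = None, None
        for _ in range(d):
            j = rng.randrange(n)
            # maximum load over the ell consecutive bins j, ..., j+ell-1 (mod n)
            cur = max(load[(j + k) % n] for k in range(ell))
            if best_max is None or cur < best_max:
                best_start, best_max = j, cur
        # place the header at best_start; the tail fills the following bins
        for k in range(ell):
            load[(best_start + k) % n] += 1
    return load


if __name__ == "__main__":
    loads = greedy_chains(n=10_000, m=1_000, d=2, ell=10)
    print("maximum load:", max(loads))
```

With `ell = 1` the sketch reduces to a simulation of GREEDY[d] on single balls, which makes it easy to compare the two processes for a fixed total number of balls.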

We show the following result, which is proved in Section 3. 

Theorem 1 Let $m \le n$, $\ell \ge 1$, $d \ge 2$, and assume $m \cdot \ell \ge n/(2e)$ and $m \cdot \ell = o(n(\ln\ln m)^{1/2})$. Let
$m$ chains of length $\ell$ be allocated to $n$ bins with $d$ i.u.r. bin choices per chain header. The maximum
load of any bin obtained by GREEDY_CHAINS[$d,\ell$] is at most

$$\frac{\ln\ln m}{\ln d} + O\left(\left(\frac{m \cdot \ell}{n}\right)^2\right)$$

with probability $1 - O((\ln m)^d/m^{d-1})$.

Note that when $m \cdot \ell = O(n)$, GREEDY_CHAINS[$d,\ell$] achieves a maximum load of $(\ln\ln m)/(\ln d) + O(1)$,
with high probability. In order to make a direct comparison, we extend the results of [1]
on the algorithm GREEDY[d] to the case where $m < n$.

Theorem 2 Assume that $d \ge 2$, $m < n$, and that $c$ is an arbitrary constant. Then, the maximum
load achieved by GREEDY[d] after the allocation of $m$ balls is at most

$$\frac{\ln\ln n - \ln\ln(n/m)}{\ln d} + O(1)$$

with a probability of $1 - O(n^{-s})$, where $s$ is a constant depending on $c$.

Theorem 2 gives the bin load arising from chain headers (ignoring the rest of the chain). Since
other collisions can occur, for example, between chain headers and internal links of the chain, this
will always be a lower bound on the maximum load. Then, provided $m\ell/n = O(1)$,

$$\frac{\ln\ln n - \ln\ln(n/m)}{\ln d} + O(1) \le \text{max load} \le \frac{\ln\ln m}{\ln d} + O(1).$$

We see that, provided that $\ell = e^{(\ln n)^{o(1)}}$ (in particular, if $\ell$ is poly-logarithmic in $n$), the ratio of the
upper and lower bounds on the maximum load is $1 + o(1)$.

Finally, suppose we allocate chain headers using GREEDY [d] but ignore the effect of this allocation 
on the rest of the chain. The following theorem, proved in Section 4, shows that this approach 
leads to a large maximum occupancy. 

Theorem 3 Assume that $m \cdot \ell \ge n/(2e)$, that $m \le n/(2e)$, that $m \ge \ln n$, and that $\ell \ge
(\ln m)/(\ln\ln m)$. Then, the maximum occupancy of any bin based on GREEDY[d] allocation of
chain headers is at least $(\ln m)/(2\ln\ln m)$, w.h.p.

The proofs of Theorem 2 and Theorem 3 can be found in Section 4. 



3 Analysis of GREEDY_CHAINS[$d,\ell$]

In this section, we prove Theorem 1. The proof uses layered induction. In the case of GREEDY[d], 
Azar et al. [1] use variables $\gamma_i$ as a high-probability upper bound on the number of bins with $i$ or
more balls, where $\gamma_6 = n/(2e)$ and, for $i > 6$, $\gamma_i = e \cdot n \cdot (\gamma_{i-1}/n)^d$.
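
To see why this recursion produces only about $\ln\ln n/\ln d$ layers, it helps to iterate it numerically. The sketch below is an illustration under a simplifying assumption: it stops at the first index with $\gamma_i < 1$, which is a convenient stand-in chosen here and not the stopping threshold used in the proofs. The function name is hypothetical.

```python
import math


def layers_until_small(n, d):
    """Iterate gamma_i = e * n * (gamma_{i-1} / n)^d from gamma_6 = n / (2e).

    Returns the first index i with gamma_i < 1 (a simplified stopping
    rule used only for illustration).
    """
    gamma = n / (2 * math.e)
    i = 6
    while gamma >= 1:
        gamma = math.e * n * (gamma / n) ** d
        i += 1
    return i


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**12):
        print(n, layers_until_small(n, 2), layers_until_small(n, 3))
```

Because $\gamma_i/n$ is raised to the power $d$ at every step, the number of iterations grows doubly logarithmically in $n$ and shrinks as $d$ increases.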

Since we allocate chains into bins, we cannot consider only the number of bins with $i$ or more chain
headers; we have to consider both the chain headers and tails. Hence, to calculate the load of a
bin, we have to consider the chain headers allocated to neighbouring bins. To do so, we define the
set $S_i$, which can be thought of as the set of bins which will result in a maximum load of at least
$i + 1$ if one of the bins in $S_i$ is chosen for a chain header. The set $S_i$ contains all bins $j$ with load
(at least) $i$ and the bins at distance at most $\ell - 1$ in front of bin $j$. We emphasize that not all bins
in $S_i$ have load $i$ themselves. We use variables $\beta_i$ as high-probability upper bounds and show
that, for $i$ large enough, $|S_i| \le \beta_i = 2e \cdot m \cdot \ell \cdot (\beta_{i-1}/n)^d$, w.h.p., in our induction. In the following,
we define some sets and random variables that are used in our analysis.

• Let $\lambda_j(t)$ be a random variable counting the h-load of bin $j$. That is, $\lambda_j(t)$ is the number of
chain headers allocated to bin $j$ at (the end of) step $t$ for $t = 0, 1, \ldots, m$.

• For given $A \subseteq [n]$, define $\lambda_A(t) = \sum_{j \in A} \lambda_j(t)$. Thus, $\lambda_A(t)$ is a random variable counting the
total h-load of the bins in $A$ at the end of step $t$.

• Let $R_j = \{j - \ell + 1, \ldots, j - 1, j\}$, the set of bins that will increase the load of bin $j$ if a chain
header is allocated to them.

• Let $L_j(t)$ be a random variable counting the load of bin $j$ at (the end of) step $t$. Thus,
$L_j(t) = \lambda_{R_j}(t)$, the load arising from the chain headers allocated to the $\ell$ bins of $R_j$.

• Let $Q_i(t) = \{j : L_j(t) \ge i\}$ be the set of labels of bins whose load is at least $i$ at (the end of)
step $t$.

• Let $S_i(t) = \bigcup_{j \in Q_i(t)} R_j$. Thus, $S_i(t)$ contains the labels of bins such that an allocation of a
chain header to one of these bins will increase the load of a bin with a load of at least $i$ by 1.

• Let $\theta_{\ge i}(t) = |S_i(t)|$.

• Let $h_t$ be a random variable counting the height of chain $t$. The algorithm GREEDY_CHAINS[$d,\ell$]
allocates the header to the bin which minimizes the maximum total load $h_t$, where

$$h_t = 1 + \min_{i=1,\ldots,d} \max\{L_{j_i+k}(t-1),\ k = 0, \ldots, \ell - 1\} \quad (1)$$

and $j_1, \ldots, j_d$ are the bins chosen i.u.r. at step $t$.
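
The definitions in the list above can be made concrete with a small helper that derives the loads and the sets of heavily loaded bins from a vector of h-loads. This is a sketch for illustration only; the function name `load_sets` and the 0-based, cyclic indexing are assumptions made here.

```python
def load_sets(h_load, ell, i):
    """Given the h-load vector (headers per bin), return (L, Q_i, S_i).

    L[j]  : load of bin j, the total h-load of R_j = {j-ell+1, ..., j}
    Q_i   : bins whose load is at least i
    S_i   : union of R_j over all j in Q_i
    Indices are taken modulo n, with bins labelled 0..n-1.
    """
    n = len(h_load)
    L = [sum(h_load[(j - k) % n] for k in range(ell)) for j in range(n)]
    Q = {j for j in range(n) if L[j] >= i}
    S = {(j - k) % n for j in Q for k in range(ell)}
    return L, Q, S


if __name__ == "__main__":
    # i = 3 headers stacked on one bin, chains of length ell = 4
    h = [0] * 20
    h[10] = 3
    L, Q, S = load_sets(h, ell=4, i=3)
    print(sorted(Q), sorted(S), len(S))
```

In the usage example, three aligned headers on a single bin give a set $S_3$ of $2\ell - 1 = 7$ bins, which is the aligned configuration used in the counting argument for these sets.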

Our method of proving Theorem 1 uses an approach developed in [1], but incorporates the added 
complexity of considering the maximum load over the chain length. For consistency, we have 
preserved notation as far as possible. 

Let $\alpha = m\ell/n$ with $\alpha \ge 1/(2e)$ and $k = \lceil 8\alpha^2 e \rceil$. First, we show that $i$ chains can contribute a block
of bins of length at most $2\ell - 1$ to $S_i(m)$.

Lemma 1 For $i \ge k$, (1) $\theta_{\ge i}(m) \le 2m\ell/i$, (2) $\theta_{\ge i}(m) \le n/2$.

Proof To prove part (1) we first consider the following worst-case scenario. Suppose at step $t$
bin $j$ contains $i$ chain headers, bins $j-\ell+1, \ldots, j-1$ are empty, and bins $j+1, \ldots, j+\ell-1$ do not contain
any chain headers. Then, $\{j, j+1, \ldots, j+\ell-1\} \subseteq Q_i(t)$, $\{j-\ell+1, \ldots, j, j+1, \ldots, j+\ell-1\} \subseteq S_i(t)$
and $|S_i(t)| \ge 2\ell - 1$.

Now suppose that $|S_i(t)| = r$ and ask how many chain headers we need for that.
This number is minimized when the chain headers are aligned, as demonstrated in our worst-case
example. Then, every set of $i$ chain headers covers $2\ell - 1$ bins. This means that we need

$$\ge \frac{i \cdot r}{2\ell - 1}$$

chain headers. In general, for $t \le m$, we get

$$\lambda_{S_i(t)}(t) \ge \frac{i \cdot \theta_{\ge i}(t)}{2\ell - 1},$$

since $\theta_{\ge i}(t) = |S_i(t)|$. For $t = m$, we have

$$\frac{i \cdot \theta_{\ge i}(m)}{2\ell} \le \lambda_{S_i(m)}(m) \le m. \quad (2)$$

Part (2) follows from part (1). Since $i \ge k$, $\alpha = m\ell/n$, and $\alpha \ge 1/(2e)$, we get

$$\theta_{\ge i}(m) \le \theta_{\ge k}(m) \le \frac{2m\ell}{k} \le \frac{2m\ell}{8\alpha^2 e} = \frac{2m\ell}{8 \cdot (m\ell/n) \cdot \alpha e} = \frac{n}{4\alpha e} \le \frac{n}{2}. \qquad \Box$$

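To prove Theorem 1, we define

$$\beta_i = \begin{cases} n, & i = 1, \ldots, k-1; \\ n/(4e\alpha), & i = k; \\ 2em\ell \cdot \left(\dfrac{\beta_{i-1}}{n}\right)^d, & i > k. \end{cases}$$

For $i \ge 0$ and $j = k + i$, it follows from the definition of $\beta_j$ that

$$\beta_j = \beta_{k+i} = n \cdot \frac{(2e\alpha)^{(d^i - 1)/(d-1)}}{(4e\alpha)^{d^i}} = n \cdot 2^{-d^i} \cdot (2e\alpha)^{-(d^i(d-2)+1)/(d-1)} \quad (3)$$

and, thus, provided $2e\alpha \ge 1$ (i.e., $m\ell/n \ge 1/(2e)$), we have $\beta_{k+i} \le n \cdot 2^{-d^i}$.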
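
The closed form in (3) can be checked numerically against the recursion $\beta_i = 2em\ell \cdot (\beta_{i-1}/n)^d$. The sketch below assumes the base value $\beta_k = n/(4e\alpha)$, which is the value (3) yields at $i = 0$ and matches the bound $n/(4\alpha e)$ from Lemma 1; the function names are hypothetical.

```python
import math


def beta_recursive(n, alpha, d, i):
    """beta_{k+i} obtained by iterating the recursion i times from beta_k."""
    b = n / (4 * math.e * alpha)  # assumed base value beta_k
    for _ in range(i):
        # 2e * m * ell equals 2e * alpha * n because alpha = m * ell / n
        b = 2 * math.e * alpha * n * (b / n) ** d
    return b


def beta_closed(n, alpha, d, i):
    """beta_{k+i} from the closed form in Eq. (3)."""
    expo = (d**i * (d - 2) + 1) / (d - 1)
    return n * 2.0 ** (-(d**i)) * (2 * math.e * alpha) ** (-expo)


if __name__ == "__main__":
    n = 10**6
    for i in range(5):
        print(i, beta_recursive(n, 1.0, 2, i), beta_closed(n, 1.0, 2, i))
```

The two functions agree up to floating-point error, and for $2e\alpha \ge 1$ both stay below $n \cdot 2^{-d^i}$, which is the doubly exponential decay the layered induction relies on.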

Define $\mathcal{E}_i(t) = \{\theta_{\ge i}(t) \le \beta_i\}$ and let

$$\mathcal{E}_i = \mathcal{E}_i(m) = \{\theta_{\ge i}(m) \le \beta_i\} \quad (4)$$

be the event that $|S_i(t)|$ is bounded by $\beta_i$ throughout the process. From the discussion following (2),
we have that $\mathcal{E}_k$ holds with certainty. Our goal is to obtain a value for $i$ such that $\Pr(\mathcal{E}_i)$ is close
to 1 and, given $\mathcal{E}_i$, no bin receives more than $i$ balls, with high probability.

We next state a standard lemma (see Lemma 3.1 in [1]), the proof of which is omitted.

Lemma 2 Let $X_1, X_2, \ldots, X_m$ be a sequence of random variables with values in an arbitrary
domain, and let $Y_1, Y_2, \ldots, Y_m$ be a sequence of binary random variables with the property that $Y_t =
Y_t(X_1, \ldots, X_t)$. If

$$\Pr(Y_t = 1 \mid X_1, \ldots, X_{t-1}) \le p,$$

then

$$\Pr\left(\sum_{t=1}^{m} Y_t \ge k\right) \le \Pr(B(m,p) \ge k),$$

where $B(m,p)$ denotes a binomially distributed random variable with parameters $m$ and $p$.
As the $d$ choices of bins for a chain header are independent, we have that

$$\Pr(h_t \ge i + 1 \mid \theta_{\ge i}(t-1) = r) \le \left(\frac{r}{n}\right)^d,$$

where $h_t$ is given by Eq. (1). For chain $t$ and integer $i$, let $Y_t^{(i)}$ be an indicator variable given by

$$Y_t^{(i)} = 1 \iff \{h_t \ge i + 1,\ \theta_{\ge i}(t-1) \le \beta_i\}.$$

Let $X_i = (x_i^1, \ldots, x_i^d)$ denote the set of bin choices of the $i$th chain header, and let $X_{1,t} = (X_1, \ldots, X_t)$ be
the choices of the first $t$ chains. We define $\mathcal{X}_{1,t}$ as the event $\{X_{1,t} = (X_1, \ldots, X_t)\}$.

Assume $X_{1,t-1} \in \mathcal{E}_i(t-1)$, meaning that after the allocation of the first $t - 1$ chains, we have at most
$\beta_i$ bins that would result in a load of at least $i + 1$ when hit by chain $t$. Then,

$$\Pr(Y_t^{(i)} = 1 \mid X_{1,t-1}) \le \left(\frac{\beta_i}{n}\right)^d$$

and, if $X_{1,t-1} \notin \mathcal{E}_i(t-1)$, then $\Pr(Y_t^{(i)} = 1 \mid X_{1,t-1}) = 0$. Either way,

$$\Pr(Y_t^{(i)} = 1 \mid X_{1,t-1}) \le \left(\frac{\beta_i}{n}\right)^d = p_i.$$
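We can apply Lemma 2 to conclude that

$$\Pr\left(\sum_{t=1}^{m} Y_t^{(i)} \ge r\right) \le \Pr(B(m,p_i) \ge r). \quad (5)$$

Considering the extreme case discussed above in Lemma 1, we see that each event $\{Y_t^{(i)} = 1\}$ adds
at most an extra $2\ell - 1$ bins to $S_{i+1}(t)$. Thus for $X_{1,m} \in \mathcal{E}_i$,

$$\theta_{\ge i+1}(m) \le 2\ell \sum_{t=1}^{m} Y_t^{(i)}. \quad (6)$$

Let $r_i = e \cdot m \cdot p_i$. Then, provided that $\sum_{t=1}^{m} Y_t^{(i)} \le r_i$, we have

$$\theta_{\ge i+1}(m) \le 2\ell r_i = 2\ell e m \cdot p_i = 2em\ell \cdot \left(\frac{\beta_i}{n}\right)^d = \beta_{i+1}. \quad (7)$$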



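From (5) and (6), we have

$$\Pr(\theta_{\ge i+1}(m) > 2\ell \cdot r_i \mid \mathcal{E}_i) \le \Pr\left(\sum_{t=1}^{m} Y_t^{(i)} > r_i \,\Big|\, \mathcal{E}_i\right) \le \frac{\Pr(B(m,p_i) > r_i)}{\Pr(\mathcal{E}_i)}. \quad (8)$$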



Provided that $m \cdot p_i \ge 2\ln\omega$ (where the precise value of $\omega$ is established below in (13)), using the
Chernoff bounds, we get

$$\Pr(B(m,p_i) \ge em \cdot p_i) \le e^{-m p_i} \le \frac{1}{\omega^2}. \quad (9)$$
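
The Chernoff-type tail bound used in (9), namely that a binomial variable with mean $\mu = mp$ reaches $e\mu$ with probability at most $e^{-\mu}$, can be compared with the exact binomial tail for small parameters. The sketch below is illustrative only; the helper name `binom_tail` and the parameter values are chosen here and do not come from the paper.

```python
import math


def binom_tail(m, p, r):
    """Exact Pr(B(m, p) >= r) for a non-negative integer r."""
    return sum(
        math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(r, m + 1)
    )


if __name__ == "__main__":
    m, p = 200, 0.05            # mean mu = m * p = 10
    mu = m * p
    r = math.ceil(math.e * mu)  # smallest integer threshold >= e * mu
    print("exact tail   :", binom_tail(m, p, r))
    print("bound e^{-mu}:", math.exp(-mu))
```

The exact tail is typically much smaller than $e^{-\mu}$; the proof only needs the bound to fall below $1/\omega^2$ once $\mu \ge 2\ln\omega$.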

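Recall that $\Pr(\neg\mathcal{E}_k) = 0$. Assume inductively that $\Pr(\neg\mathcal{E}_i) \le i/\omega^2$, for $i \ge k$. Since

$$\Pr(\neg\mathcal{E}_{i+1}) \le \Pr(\neg\mathcal{E}_{i+1} \mid \mathcal{E}_i) \cdot \Pr(\mathcal{E}_i) + \Pr(\neg\mathcal{E}_i),$$

we have, from (4), (7), (8), and (9), that

$$\Pr(\neg\mathcal{E}_{i+1}) \le \frac{i+1}{\omega^2}.$$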

Choose $i^*$ as the smallest $i$ such that $p_i = \left(\frac{\beta_i}{n}\right)^d \le \frac{2\ln\omega}{m}$. From (3),

$$i^* - k \le \frac{\ln\ln(m/\ln\omega)}{\ln d} + O(1). \quad (10)$$

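Also, as $\alpha = m\ell/n = o((\ln\ln m)^{1/2})$, we have that $k = o(i^*)$ so that the induction is not empty.
Since $p_{i^*} \le (2\ln\omega)/m$, we have that

$$\Pr(\theta_{\ge i^*+1}(m) \ge (2\ell) \cdot 6\ln\omega \mid \mathcal{E}_{i^*}) \le \frac{\Pr(B(m,p_{i^*}) \ge 6\ln\omega)}{\Pr(\mathcal{E}_{i^*})} \le \frac{\Pr(B(m,(2\ln\omega)/m) \ge 6\ln\omega)}{\Pr(\mathcal{E}_{i^*})} \le \frac{1}{\omega^2 \cdot \Pr(\mathcal{E}_{i^*})}$$

and, thus,

$$\Pr(\theta_{\ge i^*+1}(m) \ge (2\ell) \cdot 6\ln\omega) \le \frac{1}{\omega^2 \cdot \Pr(\mathcal{E}_{i^*})} \cdot \Pr(\mathcal{E}_{i^*}) + \Pr(\neg\mathcal{E}_{i^*}) \le \frac{i^* + 1}{\omega^2}. \quad (11)$$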



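Conditioned on $\theta_{\ge i^*+1}(m) \le 12\ell\ln\omega$, the probability that a chain is placed at height at least $i^* + 2$
is at most $(12\ell\ln\omega/n)^d$. Given that $Y \sim B(m, (12\ell\ln\omega/n)^d)$, $\Pr(Y \ge 1) \le m(12\ell\ln\omega/n)^d$ by
Markov's inequality. Thus, applying Lemma 2, we get

$$\Pr\left(\sum_{t=1}^{m} Y_t^{(i^*+1)} \ge 1 \,\Big|\, \theta_{\ge i^*+1}(m) \le (2\ell) \cdot 6\ln\omega\right) \le \frac{m \cdot \left(\frac{12\ell\ln\omega}{n}\right)^d}{\Pr(\theta_{\ge i^*+1} \le 12\ell \cdot \ln\omega)}. \quad (12)$$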



Let $\omega$ satisfy

$$\omega = \left(\frac{m^{d-1} \cdot \ln\ln m}{(\ln m)^d}\right)^{1/2}. \quad (13)$$



Using $m \cdot \ell = o(n(\ln\ln m)^{1/2})$, (13), (12) and (11), the probability that there is a bin with load at
least $i^* + 2$ is bounded by a term of order

$$\frac{i^* + 1}{\omega^2} + O\left(\frac{(\ln\omega)^d}{m^{d-1}}\right) = O\left(\frac{(\ln m)^d}{m^{d-1}}\right).$$

By plugging $\omega$ and $k$ into (10), we get

$$i^* \le \frac{\ln\ln m}{\ln d} + O(\alpha^2) + O(1).$$

Thus, w.h.p., no bin receives more than $i^* + 1$ balls. $\Box$



4 Allocating Chain Headers 

In this section we present the proofs of Theorem 2 and Theorem 3. 

4.1 Proof of Theorem 2 

This theorem can be shown similarly to the proof of Theorem 4 in [1]. We define $\gamma_1 = \gamma_2 = \cdots =
\gamma_5 = n$, $\gamma_6 = m/(2e)$, and

$$\gamma_i = em \cdot \left(\frac{\gamma_{i-1}}{n}\right)^d \quad \text{for } i > 6.$$

Thus $\gamma_i = C\,n\,(m/(n2e))^{d^{i-6}}$ for some constant $C > 1$. Integer $i^*$ is defined as the smallest $i$ such
that $em(\gamma_i/n)^d \le 6\ln n$, which holds for

$$i^* \le \frac{\ln\ln n - \ln\ln(n/m)}{\ln d} + O(1).$$

It can be shown that the maximum load is bounded by $i^* + 2 = (\ln\ln n - \ln\ln(n/m))/\ln d + O(1)$,
w.h.p.



4.2 Proof of Theorem 3 

The idea of the proof is as follows. Let $U_m$ be the number of bins with load at least one after the
allocation of $m$ chains by GREEDY[d] applied to the chain headers. We show that, with a good
probability, there exists a strip of $\ell$ consecutive bins which i) is used by one chain, and ii) at least
$t$ of its bins are among the $U_m$ occupied bins. This gives us a bin with load at least $t$.



First, we find a lower bound on $U_m$. Azar et al. [1] show that the protocol GREEDY[d] for $d > 1$ is
majorized by GREEDY[1] in the following sense. Let $x_i$ be the load of the bin with the $i$th largest
load after allocation of $m$ balls with GREEDY[d], and let $x'_i$ be the load of the bin with the $i$th
largest load after allocation of $m$ balls with GREEDY[1]. It is shown in [1] that there exists a
one-to-one mapping between the random choices of GREEDY[1] and GREEDY[d] such that for all
$1 \le j \le n$,

$$\sum_{i=1}^{j} x_i \le \sum_{i=1}^{j} x'_i.$$

From this it follows that the number of empty bins in an allocation with GREEDY[d] is smaller
than the one in GREEDY[1]. Let $f(m)$ be the number of occupied bins in an allocation generated
by GREEDY[1]. When $m = n/(2e)$ we get

$$E[f(m)] = n\left(1 - \left(1 - \frac{1}{n}\right)^m\right) \ge 0.9m.$$

As $m$ decreases, $m - E[f(m)]$ decreases, too. Thus, provided $m \le n/(2e)$, the expected number of
occupied cells with $d > 1$ choices per ball is always at least $0.9m$. By concentration, we can assume
$U_m \ge m/2 + 1$, provided $m \ge \ln n$.
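
The bound $E[f(m)] \ge 0.9m$ for single-choice allocation can be checked directly from the formula $n(1 - (1 - 1/n)^m)$. The sketch below is a numerical illustration; the function name and the choice $n = 1000$ are assumptions made here.

```python
import math


def expected_occupied(n, m):
    """Expected number of non-empty bins after m single-choice throws."""
    return n * (1 - (1 - 1 / n) ** m)


if __name__ == "__main__":
    n = 1000
    m_max = int(n / (2 * math.e))
    worst = min(expected_occupied(n, m) / m for m in range(1, m_max + 1))
    print("largest m checked:", m_max)
    print("smallest ratio E[f(m)] / m:", worst)
```

The ratio $E[f(m)]/m$ decreases as $m$ grows, so the tightest case is $m = n/(2e)$, where it stays just above 0.9.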

Given $U_m$, we can assume that the locations of the bins occupied by chain headers are sampled
uniformly without replacement from $[1, \ldots, n]$. We fix one of the $m$ chains and consider the strip
of $\ell$ consecutive bins occupied by the chain. Assuming $\ell \ge 2t$, the probability of at least $t$ bins
occupied by additional chain headers in that strip is at least

$$\binom{U_m - 1}{t}\left(\frac{1}{n}\right)^t (\ell)_t \ge \binom{m/2}{t}\left(\frac{1}{n}\right)^t (\ell)_t \ge \left(\frac{m}{2t}\right)^t \left(\frac{1}{n}\right)^t \left(\frac{\ell}{e}\right)^t \ge \left(\frac{1}{4e^2 t}\right)^t,$$

as $m\ell \ge n/(2e)$ and $(\ell)_t = \ell(\ell - 1) \cdots (\ell - t + 1) \ge (\ell/e)^t$.

Let $c = 1/(4e^2)$. Then, the expected number of chains allocated into a strip with load at least
$t = \ln m/(2\ln\ln m)$ is at least

$$m \cdot \left(\frac{c}{t}\right)^t = \exp(\ln m - t\ln(t/c))$$

for large enough $m$. The probability that such an event does not occur tends to zero by
Chebyshev's inequality. $\Box$
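
To get a feel for how the lower bound $m \cdot (c/t)^t$ with $c = 1/(4e^2)$ and $t = \ln m/(2\ln\ln m)$ behaves, it can be evaluated for a few values of $m$. This is a numerical illustration only; the function name is hypothetical and $t$ is treated as a real number rather than rounded to an integer.

```python
import math


def expected_heavy_strips(m):
    """Evaluate exp(ln m - t * ln(t / c)) for t = ln m / (2 ln ln m)."""
    t = math.log(m) / (2 * math.log(math.log(m)))
    c = 1 / (4 * math.e**2)
    return math.exp(math.log(m) - t * math.log(t / c))


if __name__ == "__main__":
    for m in (10**6, 10**9, 10**12):
        print(m, expected_heavy_strips(m))
```

The value grows with $m$, since $t\ln(t/c)$ is roughly $(\ln m)/2$ for large $m$; for small $m$ the constant $c$ dominates and the bound can be below one, which is why the statement is asymptotic.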



5 Conclusions and Open Problems 

In this paper we analyse the maximum load for the chains-into-bins problem where $m \cdot \ell$ balls are
connected in $m$ chains of length $\ell$. We show that, provided $m\ell \ge n/(2e)$ and $m\ell = o(n(\ln\ln m)^{1/2})$,
the maximum load is at most $\frac{\ln\ln m}{\ln d} + O((m\ell/n)^2)$, with probability $1 - O((\ln m)^d/m^{d-1})$. This
shows that, for a fixed number of balls, the maximum load decreases with increasing chain length.
Surprisingly, there are many open questions in the area of balls-into-bins processes. Only very few
results are known for weighted balls-into-bins processes, where the balls come with weights and the
load of a bin is the sum of the weights of the balls allocated to it. Here, it is not even known whether
two or more random choices improve the maximum load, compared to the simple process where
every ball is allocated to a randomly chosen bin (see [12]). Also, it would be interesting to get tight
results for the maximum load and results specifying "worst-case" weight distributions for the balls,
for example a statement of the form "given that the total weight of the balls is fixed, it is better to allocate
lots of small balls, compared to fewer bigger ones." Another interesting problem is to show results
relating the maximum load to the order in which the balls are allocated. For example, is it always
better to allocate balls in the order of decreasing ball weight, compared to the order of increasing
ball weight?

For the chains-into-bins problem, an open question is to prove Knuth's [6] conjecture stating that
breaking chains into two parts only increases the maximum load. This question is still open for a
single choice and also for several random choices per ball. See [3] for first progress in this direction.
Another question is whether results similar to the one we showed in this paper for GREEDY[d] applied
to chains also hold for the ALWAYS-GO-LEFT protocol from [13] applied to chains.

Finally, we note that it would be interesting to generalize the problem to two-dimensional packing,
and consider online allocation of $m$ objects of length $\ell$ and width $w$ to the cells of a toroidal grid
of length $n$ and width $h$.